Abstract. We analyse the computational complexity of finding Nash
equilibria in stochastic multiplayer games with ω-regular objectives.
While the existence of an equilibrium whose payoff falls into a certain 
interval may be undecidable, we single out several decidable restrictions 
of the problem. First, restricting the search space to stationary, or pure 
stationary, equilibria results in problems that are typically contained in 
PSPACE and NP, respectively. Second, we show that the existence of an
equilibrium with a binary payoff (i.e. an equilibrium where each player 
either wins or loses with probability 1) is decidable. We also establish that 
the existence of a Nash equilibrium with a certain binary payoff entails 
the existence of an equilibrium with the same payoff in pure, finite-state 
strategies. 

1 Introduction 

We study stochastic games [22] played by multiple players on a finite, directed
graph. Intuitively, a play of such a game evolves by moving a token along 
edges of the graph: Each vertex of the graph is either controlled by one of 
the players, or it is stochastic. Whenever the token arrives at a non-stochastic 
vertex, the player who controls this vertex must move the token to a successor 
vertex; when the token arrives at a stochastic vertex, a fixed probability 
distribution determines the next vertex. A measurable function maps plays 
to payoffs. In the simplest case, which we discuss here, the possible payoffs 
of a single play are binary (i.e. each player either wins or loses a given play). 
However, due to the presence of stochastic vertices, a player's expected payoff 
(i.e. her probability of winning) can be an arbitrary probability. 

Stochastic games with ω-regular objectives have been successfully applied
in the verification and synthesis of reactive systems under the influence
of random events. Such a system is usually modelled as a game between
the system and its environment, where the environment's objective is the
complement of the system's objective: the environment is considered hostile.
Therefore, the research in this area has traditionally focused on two-player
games where each play is won by precisely one of the two players, so-called
two-player zero-sum games. However, the system may comprise several
components with independent objectives, a situation which is naturally
modelled by a multiplayer game.

The most common interpretation of rational behaviour in multiplayer
games is captured by the notion of a Nash equilibrium [21]. In a Nash
equilibrium, no player can improve her payoff by unilaterally switching to a
different strategy. Chatterjee et al. [7] gave an algorithm for computing a
Nash equilibrium in stochastic multiplayer games with ω-regular winning
conditions. We argue that computing an arbitrary equilibrium is not
satisfactory. Indeed, it can be shown that
their algorithm may compute an equilibrium where all players lose almost 
surely (i.e. receive expected payoff 0), while there exist other equilibria where 
all players win almost surely (i.e. receive expected payoff 1). 

In applications, one might look for an equilibrium where as many players 
as possible win almost surely or where it is guaranteed that the expected 
payoff of the equilibrium falls into a certain interval. Formulated as a decision
problem, we want to know, given a $k$-player game $\mathcal{G}$ with initial vertex $v_0$ and
two thresholds $x, y \in [0,1]^k$, whether $(\mathcal{G}, v_0)$ has a Nash equilibrium with
expected payoff at least $x$ and at most $y$. This problem, which we call NE for
short, is a generalisation of the quantitative decision problem for two-player
zero-sum games, which asks whether in such a game player 0 has a strategy that
ensures that she wins the game with a probability that lies above a given threshold.

In this paper, we analyse the decidability of NE for games with ω-regular
objectives. Although the decidability of NE remains open, we can show that
several restrictions of NE are decidable: First, we show that NE becomes
decidable when one restricts the search space to equilibria in positional (i.e.
pure, stationary), or stationary, strategies, and that the resulting decision
problems typically lie in NP and PSPACE, respectively (e.g. if the objectives
are specified as Muller conditions). Second, we show that the following
qualitative version of NE is decidable: Given a $k$-player game $\mathcal{G}$ with initial
vertex $v_0$ and a binary payoff $x \in \{0,1\}^k$, decide whether $(\mathcal{G}, v_0)$ has a Nash
equilibrium with expected payoff $x$. Moreover, we prove that, depending on
the representation of the objective, this problem is typically complete for one
of the complexity classes P, NP, coNP and PSPACE, and that the problem is
invariant under restricting the search space to equilibria in pure, finite-state
strategies.

Our results have to be viewed in light of the (mostly) negative results
we derived in [27]. In particular, it was shown in [27] that NE becomes
undecidable if one restricts the search space to equilibria in pure strategies (as
opposed to equilibria in possibly mixed strategies), even for simple stochastic
multiplayer games. These are games with simple reachability objectives. The
undecidability result crucially makes use of the fact that the Nash equilibrium
one is looking for can have a payoff that is not binary. Hence, this result
cannot be applied to the qualitative version of NE, which we show to be
decidable in this paper. It was also proven in [27] that the problems that
arise from NE when one restricts the search space to equilibria in positional
or stationary strategies are both NP-hard. Moreover, we showed that the
restriction to stationary strategies is at least as hard as the problem SqrtSum
[?], a problem which is not known to lie inside the polynomial hierarchy.
This demonstrates that the upper bounds we prove for these problems in this
paper will be hard to improve.

Related Work. Determining the complexity of Nash equilibria has attracted 
much interest in recent years. In particular, a series of papers culminated 
in the result that computing a Nash equilibrium of a two-player game in
strategic form is complete for the complexity class PPAD [?]. However,
the work closest to ours is [26], where the decidability of (a variant of) the
qualitative version of NE in infinite games without stochastic vertices was 
proven. Our results complement the results in that paper, and although our 
decidability proof for the qualitative setting is structurally similar to the one 
in [26], the presence of stochastic vertices makes the proof substantially more 
challenging. 

Another subject that is related to the study of stochastic multiplayer
games is that of Markov decision processes with multiple objectives. These can be
viewed as stochastic multiplayer games where all non-stochastic vertices are
controlled by a single player. For ω-regular objectives, Etessami et al. [16]
proved the decidability of NE for these games. Due to the different nature of
the restrictions, this result is incomparable to our results. 

2 Preliminaries 

The model of a (two-player zero-sum) stochastic game [9] easily generalises to
the multiplayer case: Formally, a stochastic multiplayer game (SMG) is a tuple
$\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, (\mathrm{Win}_i)_{i \in \Pi})$ where

• $\Pi$ is a finite set of players (usually $\Pi = \{0, 1, \dots, k-1\}$);

• $V$ is a finite, non-empty set of vertices;

• $V_i \subseteq V$ and $V_i \cap V_j = \emptyset$ for each $i \neq j \in \Pi$;

• $\Delta \subseteq V \times ([0,1] \cup \{\bot\}) \times V$ is the transition relation;

• $\mathrm{Win}_i \subseteq V^\omega$ is a Borel set for each $i \in \Pi$.

The structure $G = (V, (V_i)_{i \in \Pi}, \Delta)$ is called the arena of $\mathcal{G}$, and $\mathrm{Win}_i$ is called
the objective, or the winning condition, of player $i \in \Pi$. A vertex $v \in V$ is
controlled by player $i$ if $v \in V_i$ and a stochastic vertex if $v \notin \bigcup_{i \in \Pi} V_i$.

We require that a transition is labelled by a probability iff it originates
in a stochastic vertex: If $(v,p,w) \in \Delta$ then $p \in [0,1]$ if $v$ is a stochastic
vertex and $p = \bot$ if $v \in V_i$ for some $i \in \Pi$. Additionally, for each
pair of a stochastic vertex $v$ and an arbitrary vertex $w$, we require that
there exists precisely one $p \in [0,1]$ such that $(v,p,w) \in \Delta$. Moreover,
for each stochastic vertex $v$, the outgoing probabilities must sum up to
1: $\sum_{(p,w) : (v,p,w) \in \Delta} p = 1$. Finally, we require that for each vertex $v$ the set
$v\Delta := \{w \in V : \text{exists } p \in (0,1] \cup \{\bot\} \text{ with } (v,p,w) \in \Delta\}$ is non-empty, i.e.
every vertex has at least one successor.
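
The requirements on the transition relation can be checked mechanically. The following is a minimal sketch, not part of the original development: the names `BOT`, `successors` and `check_arena`, and the encoding of the relation as a Python set of triples, are assumptions made purely for illustration.

```python
from fractions import Fraction

BOT = None  # hypothetical stand-in for the label ⊥ on player-controlled transitions


def successors(delta, v):
    """Return the successor set of v: every w with a transition (v, p, w), p = ⊥ or p > 0."""
    return {w for (u, p, w) in delta if u == v and (p is BOT or p > 0)}


def check_arena(vertices, controlled, delta):
    """List the violated requirements (an empty list means the arena is well-formed).

    vertices   -- the finite set V
    controlled -- dict mapping each player i to the set V_i
    delta      -- set of triples (v, p, w) with p a Fraction or BOT
    """
    errors = []
    owned = set()
    for player, block in controlled.items():
        if owned & block:
            errors.append(f"vertex sets overlap at player {player}")
        owned |= block
    for v in vertices:
        labels = [(p, w) for (u, p, w) in delta if u == v]
        if v in owned:
            if any(p is not BOT for p, _ in labels):
                errors.append(f"{v}: controlled vertex with a probability label")
        else:
            probs = [p for p, _ in labels if p is not BOT]
            targets = [w for _, w in labels]
            if len(probs) != len(labels) or any(not 0 <= p <= 1 for p in probs):
                errors.append(f"{v}: stochastic vertex needs labels in [0, 1]")
            if len(targets) != len(vertices) or set(targets) != set(vertices):
                errors.append(f"{v}: need exactly one label per target vertex")
            if sum(probs, Fraction(0)) != 1:
                errors.append(f"{v}: outgoing probabilities do not sum to 1")
        if not successors(delta, v):
            errors.append(f"{v}: no successor")
    return errors


# Illustrative arena: player 0 owns s, player 1 owns t, and c is stochastic.
half = Fraction(1, 2)
V = {"s", "c", "t"}
owners = {0: {"s"}, 1: {"t"}}
delta = {("s", BOT, "c"), ("s", BOT, "t"),
         ("c", half, "s"), ("c", half, "t"), ("c", Fraction(0), "c"),
         ("t", BOT, "t")}
print(check_arena(V, owners, delta))
```

Note that the zero-probability triple for the pair (c, c) is required by the "precisely one $p$" condition, yet c is not a successor of itself because only positive probabilities count.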

A special class of SMGs are two-player zero-sum stochastic games (2SGs).
These are SMGs played by only two players (player 0 and player 1), where
one player's objective is the complement of the other player's objective, i.e.
$\mathrm{Win}_0 = V^\omega \setminus \mathrm{Win}_1$. An even more restricted model are one-player stochastic
games, also known as Markov decision processes (MDPs), where there is only 
one player (player 0). Finally, Markov chains are SMGs with no players at all, 
i.e. there are only stochastic vertices. 

Strategies and strategy profiles. In the following, let $\mathcal{G}$ be an arbitrary
SMG. A (mixed) strategy of player $i$ in $\mathcal{G}$ is a mapping $\sigma : V^* V_i \to \mathcal{D}(V)$
assigning to each possible history $xv \in V^* V_i$ of vertices ending in a vertex
controlled by player $i$ a (discrete) probability distribution over $V$ such that
$\sigma(xv)(w) > 0$ only if $(v, \bot, w) \in \Delta$. Instead of $\sigma(xv)(w)$, we usually write
$\sigma(w \mid xv)$. A (mixed) strategy profile of $\mathcal{G}$ is a tuple $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ where $\sigma_i$ is a
strategy of player $i$ in $\mathcal{G}$. Given a strategy profile $\overline{\sigma} = (\sigma_j)_{j \in \Pi}$ and a strategy $\tau$
of player $i$, we denote by $(\overline{\sigma}_{-i}, \tau)$ the strategy profile resulting from $\overline{\sigma}$ by
replacing $\sigma_i$ with $\tau$.

A strategy $\sigma$ of player $i$ is called pure if for each $xv \in V^* V_i$ there exists
$w \in v\Delta$ with $\sigma(w \mid xv) = 1$. Note that a pure strategy of player $i$ can be
identified with a function $\sigma : V^* V_i \to V$. A strategy profile $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ is
called pure if each $\sigma_i$ is pure.

A strategy $\sigma$ of player $i$ in $\mathcal{G}$ is called stationary if $\sigma$ depends only on the
current vertex: $\sigma(xv) = \sigma(v)$ for all $xv \in V^* V_i$. Hence, a stationary strategy
of player $i$ can be identified with a function $\sigma : V_i \to \mathcal{D}(V)$. A strategy profile
$\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ of $\mathcal{G}$ is called stationary if each $\sigma_i$ is stationary.

We call a pure, stationary strategy a positional strategy and a strategy
profile consisting of positional strategies only a positional strategy profile. Clearly,
a positional strategy of player $i$ can be identified with a function $\sigma : V_i \to V$.
More generally, a pure strategy $\sigma$ is called finite-state if it can be implemented
by a finite automaton with output or, equivalently, if the equivalence relation
$\sim \subseteq V^* \times V^*$ defined by $x \sim y$ if $\sigma(xz) = \sigma(yz)$ for all $z \in V^* V_i$ has only
finitely many equivalence classes. Finally, a finite-state strategy profile is a
profile consisting of finite-state strategies only.

It is sometimes convenient to designate an initial vertex $v_0 \in V$ of the
game. We call the tuple $(\mathcal{G}, v_0)$ an initialised SMG. A strategy (strategy profile)
of $(\mathcal{G}, v_0)$ is just a strategy (strategy profile) of $\mathcal{G}$. In the following, we will
use the abbreviation SMG also for initialised SMGs. It should always be clear
from the context if the game is initialised or not.

Given an initial vertex $v_0$ and a strategy profile $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$, the
conditional probability of $w \in V$ given the history $xv \in V^* V$ is the number $\sigma_i(w \mid xv)$
if $v \in V_i$ and the unique $p \in [0,1]$ such that $(v,p,w) \in \Delta$ if $v$ is a stochastic
vertex. We abuse notation and denote this probability by $\overline{\sigma}(w \mid xv)$. The
probabilities $\overline{\sigma}(w \mid xv)$ induce a probability measure on the space $V^\omega$ in
the following way: The probability of a basic open set $v_1 \dots v_k \cdot V^\omega$ is 0 if
$v_1 \neq v_0$ and the product of the probabilities $\overline{\sigma}(v_j \mid v_1 \dots v_{j-1})$ for $j = 2, \dots, k$
otherwise. It is a classical result of measure theory that this extends to a
unique probability measure assigning a probability to every Borel subset of
$V^\omega$, which we denote by $\Pr^{\overline{\sigma}}_{v_0}$.
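
The probability of a basic open set is just a product of one-step probabilities. The sketch below illustrates this for the special case of a stationary profile; the dictionaries `owner`, `prob` and `profile` and both function names are hypothetical encodings chosen for this example, not notation from the text.

```python
from fractions import Fraction


def step_probability(owner, prob, profile, v, w):
    """One-step probability of moving from v to w under a stationary profile.

    owner[v]      -- the player controlling v, or None if v is stochastic
    prob[v]       -- for stochastic v: dict successor -> probability
    profile[i][v] -- for v controlled by i: dict successor -> probability
    """
    if owner[v] is None:
        return prob[v].get(w, Fraction(0))
    return profile[owner[v]][v].get(w, Fraction(0))


def cylinder_probability(owner, prob, profile, v0, word):
    """Probability of the basic open set word · V^ω when the play starts in v0."""
    if not word:
        return Fraction(1)  # the empty word describes the whole space
    if word[0] != v0:
        return Fraction(0)
    result = Fraction(1)
    for v, w in zip(word, word[1:]):
        result *= step_probability(owner, prob, profile, v, w)
    return result


# Illustrative data: player 0 mixes at a; c and t are stochastic.
owner = {"a": 0, "c": None, "t": None}
prob = {"c": {"a": Fraction(1, 2), "t": Fraction(1, 2)}, "t": {"t": Fraction(1)}}
profile = {0: {"a": {"c": Fraction(3, 4), "t": Fraction(1, 4)}}}
p = cylinder_probability(owner, prob, profile, "a", ["a", "c", "a", "t"])
```

For a non-stationary profile, `step_probability` would have to receive the whole history instead of the current vertex only.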

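Given a strategy $\sigma$ and a sequence $x \in V^*$, we define the residual strategy
$\sigma[x]$ by $\sigma[x](yv) = \sigma(xyv)$. If $\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ is a strategy profile, then the
residual strategy profile $\overline{\sigma}[x]$ is just the profile of the residual strategies $\sigma_i[x]$.
The following two lemmas are taken from [28].

Lemma 1. Let $\overline{\sigma}$ and $\overline{\tau}$ be two strategy profiles of $\mathcal{G}$, equal over a
prefix-closed set $X \subseteq V^*$. Then $\Pr^{\overline{\sigma}}_{v_0}(B) = \Pr^{\overline{\tau}}_{v_0}(B)$ for every Borel set $B$ all of whose
prefixes belong to $X$.

Lemma 2. Let $\overline{\sigma}$ be any strategy profile of $\mathcal{G}$, $xv \in V^* V$ a history of $\mathcal{G}$, and
$B \subseteq V^\omega$ a Borel set. Then $\Pr^{\overline{\sigma}}_{v_0}(B \cap xv \cdot V^\omega) = \Pr^{\overline{\sigma}}_{v_0}(xv \cdot V^\omega) \cdot \Pr^{\overline{\sigma}[x]}_{v}(B[x])$,
where $B[x] := \{\alpha \in V^\omega : x\alpha \in B\}$.

For a strategy profile $\overline{\sigma}$, we are mainly interested in the probabilities
$p_i := \Pr^{\overline{\sigma}}_{v_0}(\mathrm{Win}_i)$ of winning. We call $p_i$ the (expected) payoff of $\overline{\sigma}$ for player $i$
and the vector $(p_i)_{i \in \Pi}$ the (expected) payoff of $\overline{\sigma}$.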

Subarenas and end components. Given an SMG $\mathcal{G}$, we call a set $U \subseteq V$ a
subarena of $\mathcal{G}$ if 1. $U \neq \emptyset$; 2. $v\Delta \cap U \neq \emptyset$ for each $v \in U$, and 3. $v\Delta \subseteq U$ for
each stochastic vertex $v \in U$.

A set $C \subseteq V$ is called an end component of $\mathcal{G}$ if $C$ is a subarena, and
additionally $C$ is strongly connected: for every pair of vertices $v, w \in C$
there exists a sequence $v = v_1, v_2, \dots, v_n = w$ with $v_{i+1} \in v_i\Delta$ for each
$0 < i < n$. An end component $C$ is maximal in a set $U \subseteq V$ if there is no end
component $C' \subseteq U$ with $C \subsetneq C'$. For any subset $U \subseteq V$, the set of all end
components maximal in $U$ can be computed by standard graph algorithms in
quadratic time (see e.g. [13]).
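
To make the definition concrete, here is a naive sketch of a maximal end component computation: it alternates between pruning vertices that violate the subarena conditions and splitting the remainder into strongly connected components. It is written for readability and makes no attempt to meet the quadratic bound cited above; the function names and the encoding of the arena as a successor dictionary `succ` plus a set `stochastic` are assumptions of this example.

```python
def reachable(v, inside, succ):
    """Vertices reachable from v (v included) without leaving `inside`."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w in inside and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def components(inside, succ):
    """Strongly connected components of the graph induced on `inside`."""
    reach = {v: reachable(v, inside, succ) for v in inside}
    todo, result = set(inside), []
    while todo:
        v = next(iter(todo))
        comp = {w for w in reach[v] if v in reach[w]}
        result.append(comp)
        todo -= comp
    return result


def maximal_end_components(succ, stochastic, U):
    """End components maximal in U; succ[v] is the successor set of v."""
    result, work = [], [set(U)]
    while work:
        cand = work.pop()
        changed = True
        while changed:  # prune vertices that cannot lie in a subarena inside cand
            changed = False
            for v in list(cand):
                leaves = v in stochastic and not succ[v] <= cand
                if leaves or not (succ[v] & cand):
                    cand.discard(v)
                    changed = True
        if not cand:
            continue
        parts = components(cand, succ)
        if len(parts) == 1:
            result.append(frozenset(cand))  # a strongly connected subarena
        else:
            work.extend(parts)  # every end component lies inside one part
    return result


# Illustrative arena: a and c are controlled, b is stochastic.
succ = {"a": {"a", "b"}, "b": {"a", "c"}, "c": {"c"}}
mecs = set(maximal_end_components(succ, {"b"}, {"a", "b", "c"}))
```

In this example the stochastic vertex b may leave {a, b}, so the only end components are the two self-loops at a and c.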

The central fact about end components is that, under any strategy profile,
the set of vertices visited infinitely often is almost surely an end component.
For an infinite sequence $\alpha$, we denote by $\mathrm{Inf}(\alpha)$ the set of elements occurring
infinitely often in $\alpha$.

Lemma 3 ([?, 10]). Let $\mathcal{G}$ be any SMG, and let $\overline{\sigma}$ be any strategy profile of $\mathcal{G}$.
Then $\Pr^{\overline{\sigma}}_v(\{\alpha \in V^\omega : \mathrm{Inf}(\alpha) \text{ is an end component}\}) = 1$ for each vertex $v \in V$.

Moreover, for any end component $C$, we can construct a stationary
strategy profile $\overline{\sigma}$ that, when started in $C$, guarantees to visit all (and only)
vertices in $C$ infinitely often.

Lemma 4 ([13, 11]). Let $\mathcal{G}$ be any SMG, and let $C$ be any end component
of $\mathcal{G}$. There exists a stationary strategy profile $\overline{\sigma}$ with
$\Pr^{\overline{\sigma}}_v(\{\alpha \in V^\omega : \mathrm{Inf}(\alpha) = C\}) = 1$ for each vertex $v \in C$.

Values, determinacy and optimal strategies. Given a strategy $\tau$ of
player $i$ in $\mathcal{G}$ and a vertex $v \in V$, the value of $\tau$ from $v$ is the number
$\mathrm{val}^{\tau}(v) := \inf_{\overline{\sigma}} \Pr^{(\overline{\sigma}_{-i}, \tau)}_v(\mathrm{Win}_i)$, where $\overline{\sigma}$ ranges over all strategy profiles of $\mathcal{G}$.
Moreover, we define the value of $\mathcal{G}$ for player $i$ from $v$ as the supremum of
these values, i.e. $\mathrm{val}^{\mathcal{G}}_i(v) = \sup_{\tau} \mathrm{val}^{\tau}(v)$, where $\tau$ ranges over all strategies
of player $i$ in $\mathcal{G}$. Intuitively, $\mathrm{val}^{\mathcal{G}}_i(v)$ is the maximal payoff that player $i$ can
ensure when the game starts from $v$. If $\mathcal{G}$ is a two-player zero-sum game,
a celebrated theorem due to Martin [20] states that the game is determined,
i.e. $\mathrm{val}^{\mathcal{G}}_0 = 1 - \mathrm{val}^{\mathcal{G}}_1$ (where the equality holds pointwise). The number
$\mathrm{val}^{\mathcal{G}}(v) := \mathrm{val}^{\mathcal{G}}_0(v)$ is consequently called the value of $\mathcal{G}$ from $v$.

Given an initial vertex $v_0 \in V$, a strategy $\sigma$ of player $i$ in $\mathcal{G}$ is called
optimal if $\mathrm{val}^{\sigma}(v_0) = \mathrm{val}^{\mathcal{G}}_i(v_0)$. A globally optimal strategy is a strategy that is
optimal for every possible initial vertex $v_0 \in V$. Note that optimal strategies
do not need to exist since the supremum in the definition of $\mathrm{val}^{\mathcal{G}}_i$ is not
necessarily attained. However, if for every possible initial vertex there exists
an optimal strategy, then there also exists a globally optimal strategy.

Objectives. We have introduced objectives as abstract Borel sets of infinite
sequences of vertices; to be amenable to algorithmic solutions, all objectives
must be finitely representable. In verification, objectives are usually ω-regular
sets specified by formulae of the logic S1S (monadic second-order logic on
infinite words) or LTL (linear-time temporal logic) referring to unary
predicates $P_c$ indexed by a finite set $C$ of colours. These are interpreted as winning
conditions in a game by considering a colouring $\chi : V \to C$ of the vertices in
the game. Special cases are the following well-studied conditions:

• Büchi (given by a set $F \subseteq C$): the set of all $\alpha \in C^\omega$ such that
$\mathrm{Inf}(\alpha) \cap F \neq \emptyset$.

• co-Büchi (given by a set $F \subseteq C$): the set of all $\alpha \in C^\omega$ such that $\mathrm{Inf}(\alpha) \subseteq F$.

• Parity (given by a priority function $\Omega : C \to \mathbb{N}$): the set of all $\alpha \in C^\omega$ such
that $\min(\mathrm{Inf}(\Omega(\alpha)))$ is even.

• Streett (given by a set $\Omega$ of pairs $(F, G)$ where $F, G \subseteq C$): the set of all
$\alpha \in C^\omega$ such that for all pairs $(F, G) \in \Omega$ with $\mathrm{Inf}(\alpha) \cap F \neq \emptyset$ it is the
case that $\mathrm{Inf}(\alpha) \cap G \neq \emptyset$.

• Rabin (given by a set $\Omega$ of pairs $(F, G)$ where $F, G \subseteq C$): the set of all
$\alpha \in C^\omega$ such that there exists a pair $(F, G) \in \Omega$ with $\mathrm{Inf}(\alpha) \cap F \neq \emptyset$ but
$\mathrm{Inf}(\alpha) \cap G = \emptyset$.

• Muller (given by a family $\mathcal{F}$ of sets $F \subseteq C$): the set of all $\alpha \in C^\omega$ such
that there exists $F \in \mathcal{F}$ with $\mathrm{Inf}(\alpha) = F$.
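
Each of these conditions depends only on the set of colours seen infinitely often, so it can be evaluated directly on that set. The sketch below does this with Python sets; the function names and the encoding of the pairs and families are choices made for this illustration.

```python
def buchi(inf, F):
    """Some colour of F occurs infinitely often."""
    return bool(inf & F)


def co_buchi(inf, F):
    """Only colours of F occur infinitely often."""
    return inf <= F


def parity(inf, priority):
    """The least priority occurring infinitely often is even."""
    return min(priority[c] for c in inf) % 2 == 0


def streett(inf, pairs):
    """Every pair (F, G) whose F is hit infinitely often also has G hit."""
    return all(bool(inf & G) for F, G in pairs if inf & F)


def rabin(inf, pairs):
    """Some pair (F, G) has F hit infinitely often and G avoided."""
    return any(bool(inf & F) and not (inf & G) for F, G in pairs)


def muller(inf, family):
    """The set of colours occurring infinitely often belongs to the family."""
    return frozenset(inf) in family


# Illustrative set of colours occurring infinitely often.
inf = {"a", "b"}
pairs = [({"a"}, {"c"})]
```

With the same set of pairs, `streett` and `rabin` are always complementary, which mirrors the duality between the two conditions.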

Note that any Büchi condition is a parity condition with two priorities, that
any parity condition is both a Streett and a Rabin condition, and that any
Streett or Rabin condition is a Muller condition. (However, the translation
from a set of Streett/Rabin pairs to an equivalent family of accepting sets is,
in general, exponential.) In fact, the intersection (union) of any two parity
conditions is a Streett (Rabin) condition. Moreover, the complement of a Büchi
(Streett) condition is a co-Büchi (Rabin) condition and vice versa, whereas
the class of parity conditions and the class of Muller conditions are closed
under complementation. Finally, note that each of the above conditions is
prefix-independent: for every $\alpha \in C^\omega$ and $x \in C^*$, $\alpha$ satisfies the condition iff
$x\alpha$ does.

Theoretically, parity and Rabin conditions provide the best balance of
expressiveness and simplicity: On the one hand, any SMG where player $i$ has
a Rabin objective admits a globally optimal positional strategy for this player
[4]. On the other hand, any SMG with ω-regular objectives can be reduced to
an SMG with parity objectives using finite memory (see [?]). An important
consequence of this reduction is that there exist globally optimal finite-state
strategies in every SMG with ω-regular objectives. In fact, there exist globally
optimal pure strategies in every SMG with prefix-independent objectives [18].

In the following, for the sake of simplicity, we will only consider games 
where each vertex is coloured by itself, i.e. $C = V$ and $\chi = \mathrm{id}$. We would like
to point out, however, that all our results remain valid for games with other 
colourings. For the same reason, we will usually not distinguish between a 
condition and its finite representation. 

Decision problems for two-player zero-sum games. The main
computational problem for two-player zero-sum games is computing the value (and
optimal strategies for either player, if they exist). Rephrased as a decision
problem, the problem looks as follows:

Given a 2SG $\mathcal{G}$, an initial vertex $v_0$ and a rational probability $p$,
decide whether $\mathrm{val}^{\mathcal{G}}(v_0) \geq p$.

A special case of this problem arises for $p = 1$. Here, we only want to know
whether player 0 can win the game almost surely (in the limit). Let us call the
former problem the quantitative and the latter problem the qualitative decision
problem for 2SGs.

Table 1 summarises the results about the complexity of the quantitative
and the qualitative decision problem for two-player zero-sum stochastic games
depending on the type of player 0's objective. For MDPs, both problems are
decidable in polynomial time for all of the aforementioned objectives (i.e. up
to Muller conditions) [?].

              Quantitative               Qualitative

(co-)Büchi    NP ∩ coNP [?]              P-complete [14]
Parity        NP ∩ coNP [?]              NP ∩ coNP [6]
Streett       coNP-complete [3, 15]      coNP-complete [3, 15]
Rabin         NP-complete [3, 15]        NP-complete [3, 15]
Muller        PSPACE-complete [3, 19]    PSPACE-complete [3, 19]

Table 1. The complexity of deciding the value in 2SGs.



3 Nash equilibria and their decision problems 

To capture rational behaviour of (selfish) players, John Nash [21] introduced
the notion of what is now called a Nash equilibrium. Formally, given a strategy
profile $\overline{\sigma}$ in an SMG $(\mathcal{G}, v_0)$, a strategy $\tau$ of player $i$ is called a best response to $\overline{\sigma}$
if $\tau$ maximises the expected payoff of player $i$:
$\Pr^{(\overline{\sigma}_{-i}, \tau')}_{v_0}(\mathrm{Win}_i) \leq \Pr^{(\overline{\sigma}_{-i}, \tau)}_{v_0}(\mathrm{Win}_i)$
for all strategies $\tau'$ of player $i$. A Nash equilibrium is a strategy profile
$\overline{\sigma} = (\sigma_i)_{i \in \Pi}$ such that each $\sigma_i$ is a best response to $\overline{\sigma}$. Hence, in a Nash
equilibrium no player can improve her payoff by (unilaterally) switching to
a different strategy. For two-player zero-sum games, a Nash equilibrium is
nothing else than a pair of optimal strategies.

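Proposition 5. Let $(\mathcal{G}, v_0)$ be a two-player zero-sum game. A strategy profile
$(\sigma, \tau)$ of $(\mathcal{G}, v_0)$ is a Nash equilibrium iff both $\sigma$ and $\tau$ are optimal. In
particular, every Nash equilibrium of $(\mathcal{G}, v_0)$ has payoff $(\mathrm{val}^{\mathcal{G}}(v_0), 1 - \mathrm{val}^{\mathcal{G}}(v_0))$.

Proof. ($\Leftarrow$) Assume that both $\sigma$ and $\tau$ are optimal, but that $(\sigma, \tau)$ is not a
Nash equilibrium. Hence, one of the players, say player 1, can improve
her payoff by playing some strategy $\tau'$. Hence,
$\mathrm{val}^{\mathcal{G}}(v_0) = \Pr^{\sigma,\tau}_{v_0}(\mathrm{Win}_0) > \Pr^{\sigma,\tau'}_{v_0}(\mathrm{Win}_0)$.
However, since $\sigma$ is optimal, it must also be the case that
$\mathrm{val}^{\mathcal{G}}(v_0) \leq \Pr^{\sigma,\tau'}_{v_0}(\mathrm{Win}_0)$, a contradiction. The reasoning in the case that
player 0 can improve is analogous.

($\Rightarrow$) Let $(\sigma, \tau)$ be a Nash equilibrium of $(\mathcal{G}, v_0)$, and let us first assume
that $\sigma$ is not optimal, i.e. $\mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\mathcal{G}}(v_0)$. By the definition of $\mathrm{val}^{\mathcal{G}}$,
there exists another strategy $\sigma'$ of player 0 such that
$\mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\sigma'}(v_0) \leq \mathrm{val}^{\mathcal{G}}(v_0)$. Moreover, since $(\sigma, \tau)$ is a Nash equilibrium:

\[
\Pr^{\sigma,\tau}_{v_0}(\mathrm{Win}_0) \leq \mathrm{val}^{\sigma}(v_0) < \mathrm{val}^{\sigma'}(v_0) = \inf_{\tau'} \Pr^{\sigma',\tau'}_{v_0}(\mathrm{Win}_0) \leq \Pr^{\sigma',\tau}_{v_0}(\mathrm{Win}_0).
\]

Thus player 0 can improve her payoff by playing $\sigma'$ instead of $\sigma$, a
contradiction to the fact that $(\sigma, \tau)$ is a Nash equilibrium. Now, if we assume that $\tau$
is not optimal, we can analogously show the existence of a strategy $\tau'$ that
player 1 can use to improve her payoff. q.e.d.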

So far, most research on finding Nash equilibria in infinite games has
focused on computing some Nash equilibrium [7]. However, a game may
have several Nash equilibria with different payoffs, and one might not be
interested in any Nash equilibrium but in one whose payoff fulfils certain
requirements. For example, one might look for a Nash equilibrium where
certain players win almost surely while certain others lose almost surely. This
idea leads to the following decision problem, which we call NE:¹

Given an SMG $(\mathcal{G}, v_0)$ and thresholds $x, y \in [0,1]^\Pi$, decide whether
there exists a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $\geq x$ and $\leq y$.

Of course, as a decision problem the problem only makes sense if the game
and the thresholds $x$ and $y$ are represented in a finite way. In the following,
we will therefore assume that the thresholds and all transition probabilities
are rational, and that all objectives are ω-regular.

Note that NE puts no restriction on the type of strategies that realise the 
equilibrium. It is natural to restrict the search space to equilibria that are 
realised in pure, finite-state, stationary, or even positional strategies. Let us 
call the corresponding decision problems PureNE, FinNE, StatNE and PosNE, 
respectively. 

In a recent paper [27], we studied NE and its variants in the context of
simple stochastic multiplayer games (SSMGs). These are SMGs where each
player's objective is to reach a certain set $T$ of terminal vertices: $v\Delta = \{v\}$
for each $v \in T$. In particular, such objectives are both Büchi and co-Büchi
conditions. Our main results on SSMGs can be summarised as follows:

• PureNE and FinNE are undecidable; 

• StatNE is contained in PSPACE, but NP- and SqrtSum-hard;

• PosNE is NP-complete. 

In fact, PureNE and FinNE are undecidable even if one restricts to instances
where the thresholds are binary, but distinct, or if one restricts to instances 
where the thresholds coincide (but are not binary). Hence, the question 
arises what happens if the thresholds are binary and coincide. This question 
motivates the following qualitative version of NE, a problem which we call 
QualNE: 

Given an SMG $(\mathcal{G}, v_0)$ and $x \in \{0,1\}^\Pi$, decide whether $(\mathcal{G}, v_0)$ has a
Nash equilibrium with payoff $x$.

In this paper, we show that QualNE, StatNE and PosNE are decidable 
for games with arbitrary ω-regular objectives, and analyse the complexities
of these problems depending on the type of the objectives. 

4 Stationary equilibria 

In this section, we analyse the complexity of the problems PosNE and StatNE. 
Lower bounds for these problems follow from our results on SSMGs [27].

¹ In the definition of NE, the ordering ≤ is applied componentwise.

Theorem 6. PosNE is NP-complete for SMGs with Büchi, co-Büchi, parity,
Rabin, Streett, or Muller objectives.

Proof. Hardness was already proven in [27]. To prove membership in NP, we
give a nondeterministic polynomial-time algorithm for deciding PosNE. On
input $\mathcal{G}, v_0, x, y$, the algorithm simply guesses a positional strategy profile $\overline{\sigma}$
(which is basically a mapping $\bigcup_{i \in \Pi} V_i \to V$). Next, the algorithm computes
the payoff $z_i$ of $\overline{\sigma}$ for each player $i$ by computing the probability of the event
$\mathrm{Win}_i$ in the Markov chain $(\mathcal{G}^{\overline{\sigma}}, v_0)$, which arises from $\mathcal{G}$ by fixing all transitions
according to $\overline{\sigma}$. Once each $z_i$ is computed, the algorithm can easily check
whether $x_i \leq z_i \leq y_i$. To check whether $\overline{\sigma}$ is a Nash equilibrium, the algorithm
needs to compute, for each player $i$, the value $r_i$ of the MDP $(\mathcal{G}^{\overline{\sigma}_{-i}}, v_0)$, which
arises from $\mathcal{G}$ by fixing all transitions but the ones leaving vertices controlled
by player $i$ according to $\overline{\sigma}$ (and imposing the objective $\mathrm{Win}_i$). Clearly, $\overline{\sigma}$ is a
Nash equilibrium iff $r_i \leq z_i$ for each player $i$. Since we can compute the value
of any MDP (and thus any Markov chain) with one of the above objectives
in polynomial time [3, 15], all these checks can be carried out in polynomial
time. q.e.d.
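
The verification step of this proof can be illustrated on games with reachability objectives. The sketch below is a simplification under stated assumptions, not the polynomial-time procedure referred to in the proof: objectives are sets of target vertices, payoffs are computed exactly by solving the linear system of the induced Markov chain, and the value of each player's MDP is found by brute-force enumeration of her positional deviations (which suffices here because such objectives admit optimal positional strategies, but takes exponential time). All function names and the dictionaries `owner`, `succ` and `targets` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def fix_choices(owner, succ, profile):
    """Markov chain obtained by fixing every controlled vertex to its chosen successor."""
    return {v: dict(succ[v]) if owner[v] is None else {profile[v]: Fraction(1)}
            for v in owner}


def reach_probability(chain, target, v0):
    """Exact probability of reaching `target` from v0 in a finite Markov chain."""
    can = set(target)  # vertices with a positive chance of reaching the target
    changed = True
    while changed:
        changed = False
        for v, dist in chain.items():
            if v not in can and any(p > 0 and w in can for w, p in dist.items()):
                can.add(v)
                changed = True
    unknown = sorted(can - set(target))
    index = {v: k for k, v in enumerate(unknown)}
    n = len(unknown)
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]  # augmented system (I - Q) x = b
    for v in unknown:
        row = rows[index[v]]
        row[index[v]] += 1
        for w, p in chain[v].items():
            if w in target:
                row[n] += p
            elif w in index:
                row[index[w]] -= p
    for c in range(n):  # Gauss-Jordan elimination over the rationals
        pivot = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [a / rows[c][c] for a in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[c])]
    if v0 in target:
        return Fraction(1)
    return rows[index[v0]][n] if v0 in index else Fraction(0)


def is_positional_equilibrium(owner, succ, targets, profile, v0):
    """Check that no player gains by switching to another positional strategy."""
    payoff = {i: reach_probability(fix_choices(owner, succ, profile), T, v0)
              for i, T in targets.items()}
    for i, T in targets.items():
        mine = [v for v in owner if owner[v] == i]
        for choice in product(*(succ[v] for v in mine)):
            deviation = {**profile, **dict(zip(mine, choice))}
            if reach_probability(fix_choices(owner, succ, deviation), T, v0) > payoff[i]:
                return False, payoff
    return True, payoff


# Illustrative game: player 0 chooses at s between the stochastic vertices a and b.
one = Fraction(1)
owner = {"s": 0, "a": None, "b": None, "w0": None, "w1": None, "lose": None}
succ = {"s": ["a", "b"],
        "a": {"w0": Fraction(1, 2), "lose": Fraction(1, 2)},
        "b": {"w0": Fraction(1, 3), "w1": Fraction(2, 3)},
        "w0": {"w0": one}, "w1": {"w1": one}, "lose": {"lose": one}}
targets = {0: {"w0"}, 1: {"w1"}}
```

Comparing the returned payoff vector against the thresholds $x$ and $y$ would complete the check for a guessed profile.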

To prove the decidability of StatNE, we appeal to results established for
the Existential Theory of the Reals, ExTh($\mathfrak{R}$), the set of all existential first-order
sentences (over the appropriate signature) that hold in $\mathfrak{R} := (\mathbb{R}, +, \cdot, 0, 1, \leq)$.
The best known upper bound for the complexity of the associated decision
problem is PSPACE [?], which leads to the following theorem.

Theorem 7. StatNE is in PSPACE for SMGs with Büchi, co-Büchi, parity, Rabin,
Streett, or Muller objectives.

Proof. Since PSPACE = NPSPACE, it suffices to provide a nondeterministic
algorithm with polynomial space requirements for deciding StatNE. On
input $\mathcal{G}, v_0, x, y$, where w.l.o.g. $\mathcal{G}$ is an SMG with Muller objectives $\mathcal{F}_i \subseteq 2^V$,
the algorithm starts by guessing the support $S \subseteq V \times V$ of a stationary strategy
profile $\overline{\sigma}$ of $\mathcal{G}$, i.e. $S = \{(v,w) \in V \times V : \overline{\sigma}(w \mid v) > 0\}$. From the set $S$ alone,
by standard graph algorithms (see [3, 13]), one can compute (in polynomial
time) for each player $i$ the following sets:

1. the union $F_i$ of all end components (i.e. bottom SCCs) $C$ of the Markov
chain $\mathcal{G}^{\overline{\sigma}}$ that are winning for player $i$, i.e. $C \in \mathcal{F}_i$;

2. the set $R_i$ of vertices $v$ such that $\Pr^{\overline{\sigma}}_v(\mathrm{Reach}(F_i)) > 0$;

3. the union $T_i$ of all end components of the MDP $\mathcal{G}^{\overline{\sigma}_{-i}}$ that are winning for
player $i$.
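
The first two sets depend only on the graph $(V, S)$. As a rough illustration, the following sketch computes them with plain reachability searches; the function names and the encoding of the Muller family as a set of frozensets are assumptions of this example, and the third set is deliberately left out.

```python
def reachable(v, succ):
    """Vertices reachable from v (v included) in the graph given by succ."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def winning_and_reaching(vertices, support, family):
    """Return (F, R) for one player.

    F -- union of the bottom SCCs of (vertices, support) that belong to `family`
    R -- vertices from which F is reachable, i.e. reached with positive probability
    """
    succ = {v: {w for (u, w) in support if u == v} for v in vertices}
    reach = {v: reachable(v, succ) for v in vertices}
    F = set()
    for v in vertices:
        scc = frozenset(w for w in reach[v] if v in reach[w])
        if reach[v] <= scc and scc in family:  # bottom SCC accepted by the family
            F |= scc
    R = {v for v in vertices if reach[v] & F}
    return F, R


# Illustrative support with the two bottom SCCs {a} and {b, c}.
vertices = {"s", "a", "b", "c"}
support = {("s", "a"), ("s", "b"), ("a", "a"), ("b", "c"), ("c", "b")}
F0, R0 = winning_and_reaching(vertices, support, {frozenset({"a"})})
F1, R1 = winning_and_reaching(vertices, support, {frozenset({"b", "c"})})
```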

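After computing all these sets, the algorithm evaluates an existential
first-order sentence $\psi$ which can be computed in polynomial time from $\mathcal{G}$,
$v_0$, $x$, $y$, $(R_i)_{i \in \Pi}$, $(F_i)_{i \in \Pi}$ and $(T_i)_{i \in \Pi}$ over $\mathfrak{R}$ and returns the answer to this
query.

It remains to describe a suitable sentence $\psi$. Let $\alpha = (\alpha_{vw})_{v,w \in V}$,
$r = (r^i_v)_{i \in \Pi, v \in V}$ and $z = (z^i_v)_{i \in \Pi, v \in V}$ be three sets of variables, and let
$V_* = \bigcup_{i \in \Pi} V_i$ be the set of all non-stochastic vertices. The formula

\[
\varphi(\alpha) := \bigwedge_{v \in V_*} \Big( \bigwedge_{w \in v\Delta} \alpha_{vw} \geq 0 \;\wedge \bigwedge_{w \in V \setminus v\Delta} \alpha_{vw} = 0 \;\wedge \sum_{w \in v\Delta} \alpha_{vw} = 1 \Big) \;\wedge
\bigwedge_{v \in V \setminus V_*,\, w \in V} \alpha_{vw} = p_{vw} \;\wedge \bigwedge_{(v,w) \in S} \alpha_{vw} > 0 \;\wedge \bigwedge_{(v,w) \notin S} \alpha_{vw} = 0,
\]

where $p_{vw}$ is the unique number such that $(v, p_{vw}, w) \in \Delta$, states that the
mapping $\overline{\sigma} : V \to \mathcal{D}(V)$ defined by $\overline{\sigma}(w \mid v) = \alpha_{vw}$ constitutes a valid
stationary strategy profile of $\mathcal{G}$ whose support is $S$. Provided that $\varphi(\alpha)$ holds
in $\mathfrak{R}$, the formula

\[
\eta_i(\alpha, z) := \bigwedge_{v \in F_i} z^i_v = 1 \;\wedge \bigwedge_{v \in V \setminus R_i} z^i_v = 0 \;\wedge \bigwedge_{v \in V \setminus F_i} z^i_v = \sum_{w \in v\Delta} \alpha_{vw} z^i_w
\]

states that $z^i_v = \Pr^{\overline{\sigma}}_v(\mathrm{Win}_i)$ for each $v \in V$, where $\overline{\sigma}$ is defined as above. This
follows from a well-known result about Markov chains, namely that the
vector of the aforementioned probabilities is the unique solution of the given
system of equations. Finally, the formula

\[
\theta_i(\alpha, r) := \bigwedge_{v \in V} r^i_v \geq 0 \;\wedge \bigwedge_{v \in T_i} r^i_v = 1 \;\wedge \bigwedge_{v \in V_i,\, w \in v\Delta} r^i_v \geq r^i_w \;\wedge \bigwedge_{v \in V \setminus V_i} r^i_v = \sum_{w \in v\Delta} \alpha_{vw} r^i_w
\]

states that $r$ is a solution of the linear programme for computing the maximal
payoff that player $i$ can achieve when playing against the strategy profile $\overline{\sigma}_{-i}$.
In particular, the formula is fulfilled if
$r^i_v = \sup_{\tau} \Pr^{(\overline{\sigma}_{-i}, \tau)}_v(\mathrm{Reach}(T_i)) = \sup_{\tau} \Pr^{(\overline{\sigma}_{-i}, \tau)}_v(\mathrm{Win}_i)$
(where the latter equality follows from Lemmas 3 and 4),
and every other solution is greater than this one (in each component).

The desired sentence $\psi$ is the existential closure of the conjunction of
$\varphi$ and, for each player $i$, the formulae $\eta_i$ and $\theta_i$ combined with formulae
stating that player $i$ cannot improve her payoff and that the expected payoff
for player $i$ lies in between the given thresholds:

\[
\psi := \exists \alpha\, \exists r\, \exists z\, \Big( \varphi(\alpha) \wedge \bigwedge_{i \in \Pi} \big( \eta_i(\alpha, z) \wedge \theta_i(\alpha, r) \wedge r^i_{v_0} \leq z^i_{v_0} \wedge x_i \leq z^i_{v_0} \leq y_i \big) \Big).
\]

It follows that $\psi$ holds in $\mathfrak{R}$ iff $(\mathcal{G}, v_0)$ has a stationary Nash equilibrium
with payoff at least $x$ and at most $y$ whose support is $S$. Consequently, the
algorithm is correct. q.e.d.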

5 Equilibria with a binary payoff 

In this section, we prove that QualNE is decidable. We start by characterising 
the existence of a Nash equilibrium with a binary payoff in any game with 
prefix-independent objectives.

5.1 Characterisation of existence

For a subset $U \subseteq V$, we denote by $\mathrm{Reach}(U)$ the set $V^* \cdot U \cdot V^\omega$; if $U = \{v\}$,
we just write $\mathrm{Reach}(v)$ for $\mathrm{Reach}(U)$. Finally, given an SMG $\mathcal{G}$ and a player $i$,
we denote by $V_i^{>0}$ the set of all vertices $v \in V$ such that $\mathrm{val}^{\mathcal{G}}_i(v) > 0$. The
following lemma allows us to infer the existence of a Nash equilibrium from the
existence of a certain strategy profile. The proof uses so-called threat strategies
(also known as trigger strategies), which are the basis of the folk theorems in the
theory of repeated games (cf. [23, Chapter 8]).

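Lemma 8. Let $\overline{\sigma}$ be a pure strategy profile of $\mathcal{G}$ such that, for each player $i$,
$\Pr^{\overline{\sigma}}_{v_0}(\mathrm{Win}_i) = 1$ or $\Pr^{\overline{\sigma}}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$. Then there exists a pure Nash
equilibrium $\overline{\sigma}^*$ with $\Pr^{\overline{\sigma}}_{v_0} = \Pr^{\overline{\sigma}^*}_{v_0}$. If, additionally, all winning conditions are
ω-regular and $\overline{\sigma}$ is finite-state, then there exists a finite-state Nash equilibrium $\overline{\sigma}^*$
with $\Pr^{\overline{\sigma}}_{v_0} = \Pr^{\overline{\sigma}^*}_{v_0}$.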

Proof. Consider the 2SG $\mathcal{G}_i = (\{i, \Pi \setminus \{i\}\}, V, V_i, \bigcup_{j \neq i} V_j, \Delta, \mathrm{Win}_i, V^\omega \setminus \mathrm{Win}_i)$
where player $i$ plays against the coalition $\Pi \setminus \{i\}$ of all other players. Since the
set $\mathrm{Win}_i$ is prefix-independent, there exists a globally optimal pure strategy $\tau_i$
for the coalition in this game. For each player $j \neq i$, this strategy induces
a pure strategy $\tau_{j,i}$ in $\mathcal{G}$. To simplify notation, we also define $\tau_{i,i}$ to be an
arbitrary finite-state strategy of player $i$ in $\mathcal{G}$. Player $i$'s strategy $\sigma_i^*$ in $\overline{\sigma}^*$ is
defined as follows:

\[
\sigma_i^*(xv) = \begin{cases} \sigma_i(xv) & \text{if } \Pr^{\overline{\sigma}}_{v_0}(xv \cdot V^\omega) > 0, \\ \tau_{i,j}(x_2 v) & \text{otherwise,} \end{cases}
\]

where, in the latter case, $x = x_1 x_2$ with $x_1$ being the longest prefix of $xv$
such that $\Pr^{\overline{\sigma}}_{v_0}(x_1 \cdot V^\omega) > 0$ and $j \in \Pi$ being the player that has deviated
from $\overline{\sigma}$, i.e. $x_1$ ends in $V_j$; if $x_1$ is empty or ends in a stochastic vertex, we set
$j = i$. Intuitively, $\sigma_i^*$ behaves like $\sigma_i$ as long as no other player $j$ deviates from
playing $\sigma_j$, in which case $\sigma_i^*$ starts to behave like $\tau_{i,j}$.

If each $\mathrm{Win}_i$ is ω-regular, then each $\tau_i$ can be chosen to be a finite-state strategy.
Consequently, each $\tau_{j,i}$ can be assumed to be finite-state. If additionally $\overline{\sigma}$ is
finite-state, it is easy to see that the strategy profile $\overline{\sigma}^*$, as defined above, is
also finite-state.

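Note that $\Pr^{\sigma^*}_{v_0} = \Pr^{\sigma}_{v_0}$. We claim that $\sigma^*$ is a Nash equilibrium of $(\mathcal{G}, v_0)$.
Let $\rho$ be any strategy of player $i$ in $\mathcal{G}$; we need to show that
$\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i) \le \Pr^{\sigma^*}_{v_0}(\mathrm{Win}_i)$.

Let us call a history $xvw \in V^* \cdot V_i \cdot V$ a deviation history if $\Pr^{\sigma}_{v_0}(xv \cdot V^\omega) > 0$,
but $\sigma_i(xv) \neq w$ and $\rho(w \mid xv) > 0$; we denote the set of all deviation
histories by $X$.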

Claim. $\Pr^{\sigma^*_{-i},\rho}_{v_0}(B \setminus X \cdot V^\omega) \le \Pr^{\sigma}_{v_0}(B)$ for every Borel set $B$.

Proof. The claim is obviously true for the basic open sets $B = w \cdot V^\omega$ (where $w \in V^*$)
and thus also for finite, disjoint unions of such sets, which are precisely
the clopen sets (i.e. sets of the form $W \cdot V^\omega$ for finite $W \subseteq V^*$). Since the class
of clopen sets is closed under complements and finite unions, by the monotone
class theorem [17], the closure of the class of all clopen sets under taking limits
of chains contains the smallest $\sigma$-algebra containing all clopen sets, which is
just the Borel $\sigma$-algebra. Hence, it suffices to show that whenever we are given
measurable sets $A_1, A_2, \ldots \subseteq V^\omega$ with $A_1 \subseteq A_2 \subseteq \cdots$ or $A_1 \supseteq A_2 \supseteq \cdots$
such that the claim holds for each $A_n$, then the claim also holds for $\lim_n A_n$,
where $\lim_n A_n = \bigcup_{n \in \mathbb{N}} A_n$ or $\lim_n A_n = \bigcap_{n \in \mathbb{N}} A_n$, respectively. So assume
that $A_1, A_2, \ldots \subseteq V^\omega$ is a chain such that
$\Pr^{\sigma^*_{-i},\rho}_{v_0}(A_n \setminus X \cdot V^\omega) \le \Pr^{\sigma}_{v_0}(A_n)$ for
each $n \in \mathbb{N}$. Clearly, $(\lim_n A_n) \setminus X \cdot V^\omega = \lim_n (A_n \setminus X \cdot V^\omega)$. Moreover, since
measures are continuous from above and below:

$$\begin{aligned}
\Pr^{\sigma^*_{-i},\rho}_{v_0}\bigl(\lim_n (A_n \setminus X \cdot V^\omega)\bigr)
&= \lim_n \Pr^{\sigma^*_{-i},\rho}_{v_0}(A_n \setminus X \cdot V^\omega) \\
&\le \lim_n \Pr^{\sigma}_{v_0}(A_n) \\
&= \Pr^{\sigma}_{v_0}(\lim_n A_n).
\end{aligned}$$

q.e.d.

As usual in probability theory, if $P$ is a probability measure and $A$ and
$B$ are measurable sets such that $P(B) > 0$, then we denote by $P(A \mid B)$ the
conditional probability of $A$ given $B$, defined by $P(A \mid B) = P(A \cap B) / P(B)$.

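Claim. $\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xvw \cdot V^\omega) \le \mathrm{val}_i^{\mathcal{G}}(w)$ for every $xvw \in X$.

Proof. By the definition of the strategies $\tau_{j,i}$, we have that
$\Pr^{(\tau_{j,i})_{j \neq i},\rho}_{v}(\mathrm{Win}_i) \le \mathrm{val}_i^{\mathcal{G}}(v)$ for every vertex $v \in V$ and every strategy $\rho$ of player $i$. On the other
hand, if $xvw$ is a deviation history, then for each player $j$ the residual strategy
$\sigma_j^*[xv]$ is equal to $\tau_{j,i}$ on histories that start in $w$. Hence, by Lemma 2 and
since the set $\mathrm{Win}_i$ is prefix-independent, we get:

$$\begin{aligned}
\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xvw \cdot V^\omega)
&= \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \cap xvw \cdot V^\omega) \,/\, \Pr^{\sigma^*_{-i},\rho}_{v_0}(xvw \cdot V^\omega) \\
&= \Pr^{\sigma^*_{-i}[xv],\rho[xv]}_{w}(\mathrm{Win}_i) \\
&= \Pr^{(\tau_{j,i})_{j \neq i},\rho[xv]}_{w}(\mathrm{Win}_i) \\
&\le \mathrm{val}_i^{\mathcal{G}}(w).
\end{aligned}$$

q.e.d.

Using the previous two claims, we can prove that
$\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i) \le \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) = \Pr^{\sigma^*}_{v_0}(\mathrm{Win}_i)$ as follows:

$$\begin{aligned}
\Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i)
&= \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \setminus X \cdot V^\omega) + \sum_{xvw \in X} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \cap xvw \cdot V^\omega) \\
&\le \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) + \sum_{xvw \in X} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \cap xvw \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) + \sum_{xvw \in X} \Pr^{\sigma^*_{-i},\rho}_{v_0}(\mathrm{Win}_i \mid xvw \cdot V^\omega) \cdot \Pr^{\sigma^*_{-i},\rho}_{v_0}(xvw \cdot V^\omega) \\
&\le \Pr^{\sigma}_{v_0}(\mathrm{Win}_i) + \sum_{xvw \in X} \mathrm{val}_i^{\mathcal{G}}(v) \cdot \Pr^{\sigma^*_{-i},\rho}_{v_0}(xvw \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(\mathrm{Win}_i),
\end{aligned}$$

where the second inequality uses the previous claim together with
$\mathrm{val}_i^{\mathcal{G}}(w) \le \mathrm{val}_i^{\mathcal{G}}(v)$, which holds because $v \in V_i$, and the last equality follows from
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$, which implies
that $\mathrm{val}_i^{\mathcal{G}}(v) = 0$ for each $v \in V$ such that $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(v)) > 0$. (In the remaining
case $\Pr^{\sigma}_{v_0}(\mathrm{Win}_i) = 1$, the inequality to be shown holds trivially.) q.e.d.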

Finally, we can state the main result of this section. 

Proposition 9. Let $(\mathcal{G}, v_0)$ be any SMG with prefix-independent winning
conditions, and let $x \in \{0, 1\}^\Pi$. Then the following statements are equivalent:

1. There exists a Nash equilibrium with payoff $x$;

2. There exists a strategy profile $\sigma$ with payoff $x$ such that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$ for each player $i$ with $x_i = 0$;

3. There exists a pure strategy profile $\sigma$ with payoff $x$ such that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$ for each player $i$ with $x_i = 0$;

4. There exists a pure Nash equilibrium with payoff $x$.

If additionally all winning conditions are $\omega$-regular, then any of the above
statements is equivalent to each of the following statements:

5. There exists a finite-state strategy profile $\sigma$ with payoff $x$ such that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$ for each player $i$ with $x_i = 0$;

6. There exists a finite-state Nash equilibrium with payoff $x$.

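Proof. (1. $\Rightarrow$ 2.) Let $\sigma$ be a Nash equilibrium with payoff $x$. We claim that
$\sigma$ is already the strategy profile we are looking for: $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$
for each player $i$ with $x_i = 0$. Towards a contradiction, assume that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) > 0$ for some player $i$ with $x_i = 0$. Since $V$ is finite, there
exists a vertex $v \in V_i^{>0}$ and a history $x$ such that $\Pr^{\sigma}_{v_0}(xv \cdot V^\omega) > 0$. Let
$\tau$ be an optimal strategy for player $i$ in the game $(\mathcal{G}, v)$, and consider her
strategy $\sigma'$ defined by

$$\sigma'(yw) = \begin{cases} \sigma_i(yw) & \text{if } xv \not\preceq yw, \\ \tau(y'w) & \text{otherwise,} \end{cases}$$

where $xv \not\preceq yw$ means that $xv$ is not a prefix of $yw$ and, in the latter case, $y = xy'$.
Clearly, $\Pr^{\sigma}_{v_0}(xv \cdot V^\omega) = \Pr^{\sigma_{-i},\sigma'}_{v_0}(xv \cdot V^\omega)$.
Moreover, $\Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) = \Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega)$: this follows from
Lemma 1 by taking $X = V^* \setminus xv \cdot V^*$. Using Lemma 2, we can infer that
$\Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i) > 0$ as follows:

$$\begin{aligned}
\Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i)
&= \Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i \cap xv \cdot V^\omega) + \Pr^{\sigma_{-i},\sigma'}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) \\
&= \Pr^{\sigma_{-i},\sigma'}_{v_0}(xv \cdot V^\omega) \cdot \Pr^{\sigma_{-i}[x],\sigma'[x]}_{v}(\mathrm{Win}_i) + \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) \\
&= \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \cdot \Pr^{\sigma_{-i}[x],\tau}_{v}(\mathrm{Win}_i) + \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) \\
&\ge \Pr^{\sigma}_{v_0}(xv \cdot V^\omega) \cdot \mathrm{val}_i^{\mathcal{G}}(v) + \Pr^{\sigma}_{v_0}(\mathrm{Win}_i \setminus xv \cdot V^\omega) \\
&> 0.
\end{aligned}$$

Hence, player $i$ can improve her payoff by playing $\sigma'$ instead of $\sigma_i$, a
contradiction to the fact that $\sigma$ is a Nash equilibrium.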

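(2. $\Rightarrow$ 3.) Let $\sigma$ be a strategy profile of $(\mathcal{G}, v_0)$ with payoff $x$ such that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$ for each player $i$ with $x_i = 0$. Consider the MDP $\mathcal{M}$
that is obtained from $\mathcal{G}$ by removing all vertices $v \in V$ such that $v \in V_i^{>0}$ for
some player $i$ with $x_i = 0$, merging all players into one, and imposing the
objective

$$\mathrm{Win} = \bigcap_{i \in \Pi,\; x_i = 1} \mathrm{Win}_i \;\cap \bigcap_{i \in \Pi,\; x_i = 0} V^\omega \setminus \mathrm{Win}_i.$$

The MDP $\mathcal{M}$ is well-defined since its domain is a subarena of $\mathcal{G}$. Moreover,
the value $\mathrm{val}^{\mathcal{M}}(v_0)$ of $\mathcal{M}$ is equal to 1 because the strategy profile $\sigma$ induces
a strategy $\bar\sigma$ in $\mathcal{M}$ satisfying $\Pr^{\bar\sigma}_{v_0}(\mathrm{Win}) = 1$. Since each $\mathrm{Win}_i$ is
prefix-independent, so is the set $\mathrm{Win}$. Hence, there exists a pure, optimal strategy $\tau$
in $(\mathcal{M}, v_0)$. Since the value is 1, we have $\Pr^{\tau}_{v_0}(\mathrm{Win}) = 1$, and $\tau$ induces a pure
strategy profile of $\mathcal{G}$ with the desired properties.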

(3. $\Rightarrow$ 4.) Let $\sigma$ be a pure strategy profile of $(\mathcal{G}, v_0)$ with payoff $x$ such
that $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V_i^{>0})) = 0$ for each player $i$ with $x_i = 0$. By Lemma 8, there
exists a pure Nash equilibrium $\sigma^*$ of $(\mathcal{G}, v_0)$ with $\Pr^{\sigma}_{v_0} = \Pr^{\sigma^*}_{v_0}$. In particular,
$\sigma^*$ has payoff $x$.

(4. $\Rightarrow$ 1.) Trivial.

Under the additional assumption that all winning conditions are $\omega$-regular,
the implications (2. $\Rightarrow$ 5.) and (5. $\Rightarrow$ 6.) are proven analogously; the
implication (6. $\Rightarrow$ 1.) is trivial. q.e.d.



As an immediate consequence of Proposition 9, we can conclude that
finite-state strategies are as powerful as arbitrary mixed strategies as far as the
existence of a Nash equilibrium with a binary payoff in SMGs with $\omega$-regular
objectives is concerned. (This is not true for Nash equilibria with a non-binary
payoff.)

Corollary 10. Let $(\mathcal{G}, v_0)$ be any SMG with $\omega$-regular objectives, and let
$x \in \{0, 1\}^\Pi$. There exists a Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $x$ iff there
exists a finite-state Nash equilibrium of $(\mathcal{G}, v_0)$ with payoff $x$.

Proof. The claim follows from Proposition 9 and the fact that every SMG with
$\omega$-regular objectives can be reduced to one with prefix-independent $\omega$-regular
(e.g. parity) objectives. q.e.d.

5.2 Computational complexity

We can now describe an algorithm for deciding QualNE for games with
Muller objectives. The algorithm relies on the characterisation we gave in
Proposition 9, which allows us to reduce the problem to a problem about a
certain MDP.

Formally, given an SMG $\mathcal{G} = (\Pi, V, (V_i)_{i \in \Pi}, \Delta, (\mathcal{F}_i)_{i \in \Pi})$ with Muller
objectives $\mathcal{F}_i \subseteq 2^V$, and a binary payoff $x \in \{0, 1\}^\Pi$, we define the Markov
decision process $\mathcal{G}(x)$ as follows: Let $Z \subseteq V$ be the set of all $v$ such that
$\mathrm{val}_i^{\mathcal{G}}(v) = 0$ for each player $i$ with $x_i = 0$; the set of vertices of $\mathcal{G}(x)$ is
precisely the set $Z$, with the set of vertices controlled by player 0 being
$Z_0 := Z \cap \bigcup_{i \in \Pi} V_i$. (If $Z = \emptyset$, we define $\mathcal{G}(x)$ to be a trivial MDP with the
empty set as its objective.) The transition relation of $\mathcal{G}(x)$ is the restriction
of $\Delta$ to transitions between $Z$-states. Note that the transition relation of $\mathcal{G}(x)$
is well-defined since $Z$ is a subarena of $\mathcal{G}$. We say that a subset $U \subseteq V$ has
payoff $x$ if $U \in \mathcal{F}_i$ for each player $i$ with $x_i = 1$ and $U \notin \mathcal{F}_i$ for each player $i$
with $x_i = 0$. The objective of $\mathcal{G}(x)$ is $\mathrm{Reach}(T)$, where $T \subseteq Z$ is the union of
all end components $U \subseteq Z$ that have payoff $x$.
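To make the definition concrete, here is a brute-force Python sketch of the domain $Z$ and the target set $T$ for very small games. It is only an illustration under stated assumptions: the sets of value-0 vertices are taken as given in the hypothetical argument `val_zero` (computing them is the hard part), end components are enumerated over all subsets of $Z$ (exponential time), and an end component is assumed to be a non-empty set that is strongly connected inside itself and closed under the successors of its stochastic vertices. All function and parameter names are made up for this sketch.

```python
from itertools import combinations


def is_end_component(U, stochastic, succ):
    """Assumed definition: non-empty, closed under stochastic successors,
    and strongly connected using only edges inside U."""
    U = set(U)
    if not U:
        return False
    for v in U:
        if v in stochastic and not succ[v] <= U:
            return False
        if not succ[v] & U:
            return False

    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in succ[u] & U:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    return all(reach(v) == U for v in U)


def has_payoff(U, muller, x):
    """U has payoff x iff U is in F_i exactly for the players i with x_i = 1."""
    U = frozenset(U)
    return all((U in muller[i]) == (x[i] == 1) for i in x)


def build_mdp(vertices, stochastic, succ, muller, x, val_zero):
    """Return (Z, restricted successor map, T) of the MDP G(x).
    val_zero[i] is the (precomputed) set of vertices where player i has value 0."""
    Z = set(vertices)
    for i, x_i in x.items():
        if x_i == 0:
            Z &= val_zero[i]
    T = set()
    for k in range(1, len(Z) + 1):
        for U in combinations(sorted(Z), k):
            if is_end_component(U, stochastic, succ) and has_payoff(U, muller, x):
                T |= set(U)
    succ_Z = {v: succ[v] & Z for v in Z}
    return Z, succ_Z, T
```

The enumeration over subsets is deliberate: it mirrors the definition rather than any efficient procedure, so it should only be used to check tiny examples by hand.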

Lemma 11. Let $(\mathcal{G}, v_0)$ be any SMG with Muller objectives, and let $x \in \{0, 1\}^\Pi$.
Then $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$ iff $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$.

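Proof. ($\Rightarrow$) Assume that $(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$. By
Proposition 9, this implies that there exists a strategy profile $\sigma$ of $(\mathcal{G}, v_0)$ with
payoff $x$ such that $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V \setminus Z)) = 0$. We claim that $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(T)) = 1$.
Otherwise, by Lemma 3, there would exist an end component $C \subseteq Z$ such
that $C \notin \mathcal{F}_i$ for some player $i$ with $x_i = 1$ or $C \in \mathcal{F}_i$ for some player $i$
with $x_i = 0$, and $\Pr^{\sigma}_{v_0}(\{\alpha \in V^\omega : \mathrm{Inf}(\alpha) = C\}) > 0$. But then, $\sigma$ cannot have
payoff $x$, a contradiction. Now, since $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V \setminus Z)) = 0$, $\sigma$ induces a
strategy $\bar\sigma$ in $\mathcal{G}(x)$ such that $\Pr^{\bar\sigma}_{v_0}(B) = \Pr^{\sigma}_{v_0}(B)$ for every Borel set $B \subseteq Z^\omega$. In
particular, $\Pr^{\bar\sigma}_{v_0}(\mathrm{Reach}(T)) = 1$ and hence $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$.

($\Leftarrow$) Assume that $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$ (in particular, $v_0 \in Z$), and let $\sigma$ be
an optimal strategy in $(\mathcal{G}(x), v_0)$. From $\sigma$, using Lemma 4, we can devise
a strategy $\sigma'$ such that $\Pr^{\sigma'}_{v_0}(\{\alpha \in V^\omega : \mathrm{Inf}(\alpha) \text{ has payoff } x\}) = 1$. Finally,
$\sigma'$ can be extended to a strategy profile $\sigma$ of $\mathcal{G}$ with payoff $x$ such that
$\Pr^{\sigma}_{v_0}(\mathrm{Reach}(V \setminus Z)) = 0$. By Proposition 9, this implies that $(\mathcal{G}, v_0)$ has a Nash
equilibrium with payoff $x$. q.e.d.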

Since the value of an MDP with reachability objectives can be computed
in polynomial time (via linear programming, cf. [24]), the difficult part lies
in computing the MDP $\mathcal{G}(x)$ from $\mathcal{G}$ and $x$ (i.e. its domain $Z$ and the target
set $T$).
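For the specific test "value equals 1" used in Lemma 11, linear programming is not strictly needed. The sketch below swaps in a different, standard technique: in a finite MDP, a reachability objective has value 1 from a vertex iff the target can be reached almost surely from it, and the latter can be decided by a nested graph fixed point. This is an illustration, not the method referenced in the text; the representation (`stochastic` as a set of vertices, `succ` as a map to successor sets, every listed successor of a stochastic vertex having positive probability) and the name `value_one_region` are assumptions.

```python
def value_one_region(vertices, stochastic, succ, target):
    """Set of vertices from which `target` is reached with probability 1
    under some strategy (all non-stochastic vertices are controlled)."""
    U = set(vertices)                # candidate region, shrinks every round
    while True:
        R = set(target) & U          # vertices known to reach target inside U
        changed = True
        while changed:
            changed = False
            for v in U - R:
                if v in stochastic:
                    # must stay inside U and have some chance to make progress
                    ok = succ[v] <= U and bool(succ[v] & R)
                else:
                    # the controller may pick any successor already in R
                    ok = bool(succ[v] & R)
                if ok:
                    R.add(v)
                    changed = True
        if R == U:
            return U
        U = R
```

With this helper, the acceptance test of Lemma 11 amounts to checking whether `v0` lies in `value_one_region(Z, stochastic & Z, succ_Z, T)`, where `Z`, `succ_Z` and `T` describe the MDP.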

Theorem 12. QualNE is in PSPace for games with Muller objectives. 

Proof. Since PSPace = NPSpace, it suffices to give a nondeterministic
algorithm with polynomial space requirements. On input $\mathcal{G}, v_0, x$, the algorithm
starts by computing for each player $i$ with $x_i = 0$ the set of vertices $v$ with
$\mathrm{val}_i^{\mathcal{G}}(v) = 0$, which can be done in polynomial space (see Table 1). The
intersection of these sets is the domain $Z$ of the Markov decision process $\mathcal{G}(x)$.
If $v_0$ is not contained in this intersection, the algorithm immediately rejects.
Otherwise, the algorithm proceeds by guessing a set $T' \subseteq Z$ and for each
$v \in T'$ a set $U_v \subseteq Z$ with $v \in U_v$. If, for each $v \in T'$, the set $U_v$ is an end
component with payoff $x$, the algorithm proceeds by computing (in polynomial
time) the value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of the MDP $\mathcal{G}(x)$ with $T'$ substituted for $T$ and
accepts if the value is 1. In all other cases, the algorithm rejects.

The correctness of the algorithm follows from Lemma 11 and the fact
that $\Pr^{\sigma}_{v_0}(\mathrm{Reach}(T')) \le \Pr^{\sigma}_{v_0}(\mathrm{Reach}(T))$ for any strategy $\sigma$ in $\mathcal{G}(x)$ and any
subset $T' \subseteq T$. q.e.d.

Since any SMG with $\omega$-regular objectives can effectively be reduced to one with
Muller objectives, Theorem 12 implies the decidability of QualNE for games
with arbitrary $\omega$-regular objectives (e.g. given by S1S formulae). Regarding
games with Muller objectives, a matching PSPACE-hardness result appeared
in [19], where it was shown that the qualitative decision problem for 2SGs
with Muller objectives is PSPACE-hard, even for games without stochastic
vertices. However, this result relies on the use of arbitrary colourings.

To solve QualNE for games with Streett objectives, we will make use of
the following procedure StreettEC($U$), which computes for a game $\mathcal{G}$ with
Streett objectives $\Omega_i$, $i \in \Pi$, and a binary payoff $x \in \{0, 1\}^\Pi$ the union of all
end components with payoff $x$ that are contained in $U \subseteq V$.

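procedure StreettEC($U$)
    $Z := \emptyset$
    Compute (in polynomial time) all end components of $\mathcal{G}$ maximal in $U$
    for each such end component $C$ do
        $S := \{i \in \Pi : x_i = 1 \text{ and ex. } (F, G) \in \Omega_i \text{ s.th. } C \cap F \neq \emptyset \text{ and } C \cap G = \emptyset\}$
        $R := \{i \in \Pi : x_i = 0 \text{ and } (C \cap F = \emptyset \text{ or } C \cap G \neq \emptyset) \text{ for all } (F, G) \in \Omega_i\}$
        if $S = R = \emptyset$ then
            $Z := Z \cup C$
        else if $S \neq \emptyset$ then
            $Y := C \cap \bigcap_{i \in S} \bigcap_{(F,G) \in \Omega_i,\; C \cap G = \emptyset} (C \setminus F)$
            $Z := Z \cup \mathrm{StreettEC}(Y)$
        else if $R \neq \emptyset$ and $C \cap F \neq \emptyset$ for all $(F, G) \in \Omega_i$, $i \in R$ then
            $Y := C \cap \bigcap_{i \in R} \bigcap_{(F,G) \in \Omega_i} (C \setminus G)$
            $Z := Z \cup \mathrm{StreettEC}(Y)$
        end if
    end for
    return $Z$
end procedure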
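The two index sets $S$ and $R$ computed for each end component can be sketched in Python as follows. This is only an illustration of the set computations, not an implementation of the whole procedure (in particular, it does not compute maximal end components). The Streett objectives are assumed to be given as a dictionary `omega` from players to lists of pairs of vertex sets, and all function names are hypothetical.

```python
def streett_satisfied(C, pairs):
    """C satisfies a Streett objective iff every pair (F, G) that meets C in F
    also meets C in G."""
    return all(not (C & F) or bool(C & G) for F, G in pairs)


def classify_component(C, omega, x):
    """Return (S, R): the players who should win but whose objective is violated
    by C, and the players who should lose but whose objective is satisfied by C."""
    S = {i for i, pairs in omega.items()
         if x[i] == 1 and not streett_satisfied(C, pairs)}
    R = {i for i, pairs in omega.items()
         if x[i] == 0 and streett_satisfied(C, pairs)}
    return S, R


def refine_for_winners(C, omega, S):
    """Candidate set Y for the branch S != {}: remove every F whose partner G
    does not meet C, for all players in S."""
    Y = set(C)
    for i in S:
        for F, G in omega[i]:
            if not (C & G):
                Y -= F
    return Y
```

A component $C$ has payoff $x$ exactly when `classify_component` returns two empty sets; otherwise the procedure recurses on a smaller candidate set such as the one returned by `refine_for_winners`.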

Note that on input $U$, StreettEC calls itself at most $|U|$ times; hence, the
procedure runs in polynomial time. Moreover, we can obtain a polynomial-time
procedure RabinEC that computes the same output for games with
Rabin objectives $\Omega_i$ by switching $x_i = 0$ and $x_i = 1$ in the definitions of $S$
and $R$.

Theorem 13. QualNE is NP-complete for games with Streett objectives. 

Proof. Hardness was already proven in [26]. To prove membership in NP, we
describe a nondeterministic, polynomial-time algorithm: On input $\mathcal{G}, v_0, x$,
the algorithm starts by guessing a subarena $Z' \subseteq V$ and, for each player $i$
with $x_i = 0$, a positional strategy $\tau_i$ of the coalition $\Pi \setminus \{i\}$ in the 2SG $\mathcal{G}_i$,
as defined in the proof of Lemma 8. In the next step, the algorithm checks
(in polynomial time) whether $\mathrm{val}^{\tau_i}(v) = 1$ for each vertex $v \in Z'$ and each
player $i$ with $x_i = 0$. If not, the algorithm rejects immediately. Otherwise,
the algorithm proceeds by calling the procedure StreettEC to determine the
union $T'$ of all end components with payoff $x$ that are contained in $Z'$. Finally,
the algorithm computes (in polynomial time) the value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of the
MDP $\mathcal{G}(x)$ with $Z'$ substituted for $Z$ and $T'$ substituted for $T$. If this value
is 1, the algorithm accepts; otherwise, it rejects.

It remains to show that the algorithm is correct: On the one hand, if
$(\mathcal{G}, v_0)$ has a Nash equilibrium with payoff $x$, then the run of the algorithm
where it guesses $Z' = Z$ and globally optimal positional strategies $\tau_i$ (which
exist since in the games $\mathcal{G}_i$ the coalition has a Rabin objective) will be accepting
since then $T' = T$ and, by Lemma 11, $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$. On the other hand,
in any accepting run of the algorithm we have $Z' \subseteq Z$ and $T' \subseteq T$, and
the value that the algorithm computes cannot be higher than $\mathrm{val}^{\mathcal{G}(x)}(v_0)$;
hence, $\mathrm{val}^{\mathcal{G}(x)}(v_0) = 1$, and Lemma 11 guarantees the existence of a Nash
equilibrium with payoff $x$. q.e.d.

Theorem 14. QualNE is coNP-complete for games with Rabin objectives. 

Proof. Hardness is proven by a slight modification of the reduction for
demonstrating NP-hardness of QualNE for games with Streett objectives (see the
appendix). To show membership in coNP, we describe a nondeterministic,
polynomial-time algorithm for the complement of QualNE. On input $\mathcal{G}, v_0, x$,
the algorithm starts by guessing a subarena $Z' \subseteq V$ and, for each player $i$
with $x_i = 0$, a positional strategy $\sigma_i$ of player $i$ in $\mathcal{G}$. In the next step, the
algorithm checks whether for each vertex $v \in Z'$ there exists some player $i$ with
$x_i = 0$ and $\mathrm{val}^{\sigma_i}(v) > 0$. If not, the algorithm rejects immediately. Otherwise,
the algorithm proceeds by calling the procedure RabinEC to determine the
union $T'$ of all end components with payoff $x$ that are contained in $V \setminus Z'$.
Finally, the algorithm computes (in polynomial time) the value $\mathrm{val}^{\mathcal{G}(x)}(v_0)$ of
the MDP $\mathcal{G}(x)$ with $V \setminus Z'$ substituted for $Z$ and $T'$ substituted for $T$. If this
value is not 1, the algorithm accepts; otherwise, it rejects.

The correctness of the algorithm is proven in a similar fashion to the
proof of the previous theorem. q.e.d.

Since any parity condition can be turned into both a Streett and a Rabin
condition where the number of pairs is linear in the number of priorities, we
can immediately infer from Theorems 13 and 14 that QualNE is in NP $\cap$ coNP
for games with parity objectives.

Corollary 15. QualNE is in NP $\cap$ coNP for games with parity objectives.
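The translation from parity to Rabin and Streett conditions with linearly many pairs can be sketched as follows. The sketch assumes the min-parity convention (a play is won iff the least priority seen infinitely often is even), which may differ from the convention used elsewhere in this paper, and it reads a pair $(F, G)$ as in the rest of this section: a Rabin condition holds iff for some pair $F$ is visited infinitely often and $G$ is not, and a Streett condition holds iff for every pair, visiting $F$ infinitely often implies visiting $G$ infinitely often. The function names are hypothetical.

```python
def parity_to_rabin(priority):
    """priority: dict vertex -> int. One Rabin pair per even priority k:
    F = vertices of priority k, G = vertices of smaller priority."""
    pairs = []
    for k in sorted(set(priority.values())):
        if k % 2 == 0:
            F = {v for v, p in priority.items() if p == k}
            G = {v for v, p in priority.items() if p < k}
            pairs.append((F, G))
    return pairs


def parity_to_streett(priority):
    """One Streett pair per odd priority k:
    F = vertices of priority k, G = vertices of smaller priority."""
    pairs = []
    for k in sorted(set(priority.values())):
        if k % 2 == 1:
            F = {v for v, p in priority.items() if p == k}
            G = {v for v, p in priority.items() if p < k}
            pairs.append((F, G))
    return pairs


def rabin_holds(inf, pairs):
    """inf: set of vertices visited infinitely often."""
    return any(bool(inf & F) and not (inf & G) for F, G in pairs)


def streett_holds(inf, pairs):
    return all(not (inf & F) or bool(inf & G) for F, G in pairs)
```

Both translations produce at most one pair per priority, which is the linear bound used in the argument above.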

It is a major open problem whether the qualitative (or even the
quantitative) decision problem for 2SGs with parity objectives is in P. This would
imply that QualNE is decidable in polynomial time for games with parity
objectives since this would allow us to compute the domain of the MDP $\mathcal{G}(x)$
in polynomial time. For each $d \in \mathbb{N}$, a class of games where the qualitative
decision problem is provably in P is the class of all 2SGs with parity objectives
that use at most $d$ priorities. For $d = 2$, this class includes all 2SGs with
a Büchi or a co-Büchi objective (for player 0). Hence, we have the following
theorem.

Theorem 16. For each $d \in \mathbb{N}$, QualNE is in P for games with parity winning
conditions that use at most $d$ priorities. In particular, QualNE is in P for
games with (co-)Büchi objectives.



6 Conclusion 

We have analysed the complexity of deciding whether a stochastic multiplayer 
game with $\omega$-regular objectives has a Nash equilibrium whose payoff falls into
a certain interval. Specifically, we have isolated several decidable restrictions 
of the general problem that have a manageable complexity (PSPace at most). 
For instance, the complexity of the qualitative variant of NE is usually not 
higher than for the corresponding problem for two-player zero-sum games. 

Apart from settling the complexity of NE (where arbitrary mixed strate- 
gies are allowed), two directions for future work come to mind: First, one 
could study other restrictions of NE that might be decidable. For example, 
it seems plausible that the restriction of NE to games with two players is 
decidable. Second, it seems interesting to see whether our decidability results 
can be extended to more general models of games, e.g. concurrent games or 
games with infinitely many states like pushdown games.

Appendix

Theorem. QualNE is coNP-hard for games with Rabin objectives. 

Proof. The proof is a variant of the proof for NP-hardness of the problem of
deciding whether player 0 has a winning strategy in a two-player zero-sum
game with a Rabin objective [15] and proceeds by a reduction from the unsatisfiability
problem for Boolean formulae.

Given a Boolean formula $\varphi$ in conjunctive normal form, we construct
a two-player SMG $\mathcal{G}_\varphi$ without any stochastic vertex as follows: For each
clause $C$ the game $\mathcal{G}_\varphi$ has a vertex $C$, which is controlled by player 0, and for
each literal $X$ or $\neg X$ occurring in $\varphi$ there is a vertex $X$ or $\neg X$, respectively,
which is controlled by player 1. There are edges from a clause to each literal
that occurs in this clause, and from a literal to every clause occurring in $\varphi$.
Player 1's objective is given by the single Rabin pair $(V, \emptyset)$, i.e. she always
wins, whereas player 0's objective consists of all Rabin pairs of the form
$(\{X\}, \{\neg X\})$ and $(\{\neg X\}, \{X\})$.

Obviously, $\mathcal{G}_\varphi$ can be constructed from $\varphi$ in polynomial time. We claim
that $\varphi$ is unsatisfiable if and only if $(\mathcal{G}_\varphi, C)$ has a Nash equilibrium with
payoff $(0, 1)$ (where $C$ is an arbitrary clause).
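The construction of the game from a CNF formula is simple enough to be sketched in a few lines of Python. The encoding is an assumption made for illustration: clauses are lists of non-zero integers in the DIMACS style (so `-3` stands for the negation of variable 3), clause vertices are tuples `('C', k)` and literal vertices are tuples `('L', lit)`. If the complement of a literal does not occur in the formula, the second component of the corresponding Rabin pair is taken to be empty. The function name `build_game` is hypothetical.

```python
def build_game(clauses):
    """Build the two-player game for a CNF formula.
    Returns (owner, succ, rabin), where rabin[p] is the list of Rabin pairs
    (F, G) of player p."""
    clause_vs = [('C', k) for k in range(len(clauses))]
    literals = sorted({lit for clause in clauses for lit in clause})
    lit_vs = [('L', lit) for lit in literals]

    # Clause vertices belong to player 0, literal vertices to player 1.
    owner = {v: 0 for v in clause_vs}
    owner.update({v: 1 for v in lit_vs})

    # Edges: clause -> each of its literals, literal -> every clause.
    succ = {('C', k): {('L', lit) for lit in clause}
            for k, clause in enumerate(clauses)}
    for v in lit_vs:
        succ[v] = set(clause_vs)

    V = set(clause_vs) | set(lit_vs)
    rabin = {
        1: [(V, set())],  # player 1 always wins
        0: [({('L', lit)}, {('L', -lit)} & V) for lit in literals],
    }
    return owner, succ, rabin
```

The size of the output is clearly polynomial in the size of the formula: there is one vertex per clause and per occurring literal, and one Rabin pair of player 0 per occurring literal.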

($\Rightarrow$) Assume that $\varphi$ is not satisfiable. We claim that player 1 has a
strategy $\tau$ to ensure that player 0's objective is violated. Consequently, for any
strategy $\sigma$ of player 0, the strategy profile $(\sigma, \tau)$ is a Nash equilibrium with
payoff $(0, 1)$. Otherwise, let $\sigma$ be a positional optimal strategy for player 0. By
determinacy, this strategy ensures that player 0's objective is satisfied. But a
positional strategy $\sigma$ of player 0 chooses for each clause a literal contained in
this clause. Since $\varphi$ is unsatisfiable, there must exist a variable $X$ and clauses
$C_1$ and $C_2$ such that $\sigma(C_1) = X$ and $\sigma(C_2) = \neg X$. Player 1's counter strategy
is to play from $X$ to $C_2$ and from any other literal to $C_1$. So the strategy $\sigma$ is
not optimal, a contradiction.

($\Leftarrow$) Assume that $\varphi$ is satisfiable. Consider player 0's positional strategy
$\sigma$ of playing from a clause to a literal that satisfies this clause. This ensures
that for each variable $X$ at most one of the literals $X$ or $\neg X$ is visited infinitely
often. The value of $\sigma$ from any vertex is 1; hence, there can be no Nash
equilibrium with payoff $(0, 1)$. q.e.d.